Concept Mining: A Conceptual Understanding based Approach

Author

  • Shady Shehata
Abstract

With the rapid daily growth of information, there is a considerable need to extract and discover valuable knowledge from data sources such as the World Wide Web. Most common text mining techniques are based on the statistical analysis of a term, either a word or a phrase. These techniques treat documents as bags of words and pay no attention to the meaning of the document content. In addition, statistical analysis of term frequency captures the importance of a term within a document only. However, two terms can have the same frequency in their documents while one contributes more to the meaning of its sentences than the other. There is therefore a strong need for a model that captures the meaning of linguistic utterances in a formal structure. The underlying model should indicate the terms that capture the semantics of the text. Such a model can capture the terms that present the concepts of a sentence, which leads to discovering the topic of the document. This work introduces a new concept-based model that analyzes terms at the sentence, document, and corpus levels rather than at the document level only, as in traditional analysis. The concept-based model can effectively discriminate between terms that are unimportant with respect to sentence semantics and terms that hold the concepts representing the sentence meaning. The proposed model consists of a concept-based statistical analyzer, a conceptual ontological graph representation, a concept extractor, and a concept-based similarity measure. A term that contributes to the sentence semantics is assigned two different weights, one by the concept-based statistical analyzer and one by the conceptual ontological graph representation. These two weights are combined into a new weight. The concept extractor selects the concepts with the maximum combined weights. The similarity between documents is calculated with a new concept-based similarity measure.
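The weighting and selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the two weights are combined, so the product used here is only an assumption, and the function names `combine_weights` and `extract_concepts`, the cutoff `k`, and the sample weights are all hypothetical.

```python
from typing import Dict, List, Tuple


def combine_weights(stat_w: Dict[str, float],
                    cog_w: Dict[str, float]) -> Dict[str, float]:
    """Combine the two per-term weights into a single weight.

    Assumption: the weights are multiplied. The abstract only says that
    they are "combined into a new weight", not how.
    """
    shared_terms = stat_w.keys() & cog_w.keys()
    return {term: stat_w[term] * cog_w[term] for term in shared_terms}


def extract_concepts(combined: Dict[str, float],
                     k: int) -> List[Tuple[str, float]]:
    """Return the k terms with the highest combined weight.

    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(combined.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


# Hypothetical weights for three terms of one sentence.
statistical = {"model": 0.8, "growth": 0.3, "concept": 0.6}
ontological = {"model": 0.5, "growth": 0.2, "concept": 0.9}

top_concepts = extract_concepts(combine_weights(statistical, ontological), k=2)
print([term for term, _ in top_concepts])
```

With these sample values, "concept" and "model" rank above "growth", which mirrors the idea that some terms contribute more to sentence meaning than others.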
The proposed similarity measure takes full advantage of the concept analysis measures at the sentence, document, and corpus levels in calculating the similarity between documents. Large sets of experiments using the proposed concept-based model are conducted on different datasets in text clustering, categorization, and retrieval. The experiments provide an extensive comparison between traditional weighting and the concept-based weighting obtained by the concept-based model. Experimental results in text clustering, categorization, and retrieval demonstrate a substantial enhancement in quality using: (1) concept-based term frequency (tf), (2) conceptual term frequency (ctf), (3) the concept-based statistical analyzer, (4) the conceptual ontological graph, and (5) the concept-based combined model. In text clustering, the evaluation of results relies on two quality measures, the F-Measure and


Similar resources

Conceptual Code Mining: Mining for Source-Code Regularities with Formal Concept Analysis

Understanding the conceptual structure of large software systems, whether it is for software understanding or reengineering purposes, is a nontrivial task. In particular, knowing where to start the comprehension process is more difficult than it seems, especially when a system is large and complex and time is scarce. We propose an approach to mine a system’s source code automatically and effici...


Concept Mining using Conceptual Ontological Graph (COG)

Concept mining (CM) is the area of exploring and finding links, associations, relationships, and patterns among huge collections of information. In this paper, we propose concept-based text representation, with an emphasis on using the proposed representation in different applications such as information retrieval, text summarization, and question answering. This work presents a new paradigm f...


The Design and Trial of a Learning Environment Based on Model Construction Approach to Instruction Aimed at Improving Concept Learning and Modeling Practices

M. Maaleki; H. FarDaanesh, Ph.D.; E. Talaa’ee, Ph.D.; J. Haatami, Ph.D. Model construction is an integrated approach aimed at a better understanding and acquisition of scientific/epistemological concepts and skills. To tr...


A multilingual text mining approach to web cross-lingual text retrieval

To enable concept-based cross-lingual text retrieval (CLTR) using multilingual text mining, our approach will first discover the multilingual concept–term relationships from linguistically diverse textual data relevant to a domain. Second, the multilingual concept–term relationships, in turn, are used to discover the conceptual content of the multilingual text, which is either a document contai...


The concept of self-control in the family caregivers of patients with chronic disease based on the family-centered empowerment model: A qualitative directed content analysis

Background & Aim: Self-control is the capacity to organize cognitive and emotional responses in order to provide continuous and adaptive behavior with ideal standards for long-term goals. Due to the high levels of care burden of patients with chronic disease, this study aims to explain the concept of self-control in the family caregivers of patients with chronic disease based on the family-cent...


Efficient Web Usage Mining Based on Formal Concept Analysis

Web usage mining attempts to discover useful knowledge from the secondary data obtained from the interactions of the users with the web. Web usage mining has become very critical for effective web site management, creating adaptive web sites, business and support services, personalization and so on. Web usage mining aims to discover interesting user access patterns from web logs. Formal Based C...




Publication date: 2009